Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL][NATIVECPU][LIBCLC] Use libclc for SYCL Native CPU #10970

Merged
merged 35 commits into from
Sep 7, 2023

Conversation

PietroGhg
Copy link
Contributor

This PR allows linking to libclc when compiling for SYCL Native CPU. Currently only the x86_64-unknown-linux-gnu target triple is supported, additional target triples (and possibly a more versatile way of setting them) will come with follow up PRs.
Some useful information for reviewing:

  • We start using an AddrSpaceMap (set in TargetInfo.cpp) because the mangled names emitted by the device compiler need to match with the names provided by libclc. The AddressSpaceMap is taken from the PTX Target.
  • Changes in Driver are needed to find and link to libclc.
  • libclc/ptx-nvidiacl/libspirv/atomic/loadstore_helpers.ll has been split into 4 modules, one for each memory ordering constraint. Copies of these modules have been added in generic (because some functions in generic/libspirv/atomic needed them), and the module split allows to specialize the file for targets that may not support some orderings. Currently only a couple of function for acquire and seq_cst have been implemented for generic, but the others will be implemented in a follow up PR.
  • We've added a target in libclc for x86_64-unknown-linux. This has been done because some math builtins in generic have been defined as
typedef char vec __attribute__((ext_vector_type(8)));
__attribute__((overloadable)) vec __clc_native_popcount(vec x) __asm("llvm.ctpop" ".v16i" "8");

vec call(vec x) {
  return __clc_native_popcount(x);
}

While this approach conveniently allows to call directly LLVM intrinsics, it does seem to play well with the ABI for x86_64-unknown-linux, since it leads to this IR:

define dso_local double @call(double noundef %x.coerce) #0 {
entry:
  %0 = bitcast double %x.coerce to <8 x i8>
  %1 = bitcast <8 x i8> %0 to double
  %call = call double @llvm.ctpop.v8i8(double noundef %1) #8
  %2 = bitcast double %call to <8 x i8>
  %3 = bitcast <8 x i8> %2 to double
  ret double %3
}

Which is invalid because lvm.ctpop.v8i8 expect a vector of i8 and not a double, leading to failing asserts in the compiler that prevented from building libclc.

As a temporary work around we have added empty files that override the files in generic when building for x86_64-unknown-linux, allowing to complete the build, even though the corresponding builtins will be missing from the library. We are working on a proper solution for this.

@PietroGhg PietroGhg requested review from a team as code owners August 25, 2023 13:42
@PietroGhg PietroGhg requested review from a team as code owners August 25, 2023 13:42
Copy link
Contributor

@jandres742 jandres742 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on common code

Copy link
Contributor

@elizabethandrews elizabethandrews left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FE changes LGTM

@PietroGhg
Copy link
Contributor Author

Hello @intel/dpcpp-devops-reviewers @intel/llvm-reviewers-runtime @intel/llvm-reviewers-cuda @intel/dpcpp-clang-driver-reviewers, can I get some feedback on this PR? Thank you :)

Copy link
Contributor

@steffenlarsen steffenlarsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall it looks good, but why is it we need to add all those empty .cl files? Does LIBCLC assume their existence for all targets?

@@ -181,6 +181,8 @@ set( nvptx--nvidiacl_devices none )
set( nvptx64--nvidiacl_devices none )
set( spirv-mesa3d-_devices none )
set( spirv64-mesa3d-_devices none )
# todo: does this need to be set for each possible triple?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please follow the comment formatting used throughout the rest of the file. I.e.

Suggested change
# todo: does this need to be set for each possible triple?
# TODO: Does this need to be set for each possible triple?

@PietroGhg
Copy link
Contributor Author

Overall it looks good, but why is it we need to add all those empty .cl files? Does LIBCLC assume their existence for all targets?

Yes, my understanding is that the libclc CMake config falls back to the generic version of those files if they are not found in the target-specific folders, so if we were to remove those files right now we would have compiler errors when building the library for x86, because it would pick up the files from generic.
There are probably better ways of handling this than providing empty files, but this allows us to temporarily disable those files for x86 without making changes to libclc's CMake (changes that we would undo anyway once we provide implementations for those functions).

@steffenlarsen
Copy link
Contributor

I agree, this is probably the least intrusive solution to the imposed problem. Thanks!

clang/lib/Driver/Driver.cpp Outdated Show resolved Hide resolved
clang/lib/Driver/Driver.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@smanna12 smanna12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FE changes OK for me!

@PietroGhg
Copy link
Contributor Author

Hello @intel/llvm-gatekeepers I think this is ready to be merged, thanks :)

@steffenlarsen steffenlarsen merged commit 6da4d2e into intel:sycl Sep 7, 2023
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants